Designing a Rule Based Stemmer for Afaan Oromo Text

Authors

  • Ermias Abebe
  • Debela Tesfaye

Abstract

Most natural language processing systems use a stemmer as a separate module in their architecture. Stemming is especially important for developing machine translators, speech recognizers, and search engines. In this work, a stemming system for Afaan Oromo is presented. The system takes a word as input and removes its affixes according to a rule-based algorithm. The result of the study is a prototype context-sensitive iterative stemmer. An error-counting technique was employed to evaluate the performance of the stemmer, and the errors were analyzed and classified into two categories: under-stemming and over-stemming errors. For testing, a corpus collected from various public Afaan Oromo newspapers and bulletins was used. Newspapers, bulletins, and public magazines cover a wide range of community issues (social, economic, technological, and political), which reduces the likelihood of biasing the corpus toward specific words that do not appear in everyday language. Based on the evaluation of the experiments, the overall accuracy of the stemmer is encouraging, which shows that stemming can be performed with low error rates in highly inflected languages such as Afaan Oromo.
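
The abstract describes the stemmer only at a high level (iterative, context-sensitive affix removal, evaluated by counting under-stemming and over-stemming errors) and does not list its rules. The following Python sketch illustrates that general technique under stated assumptions: the suffix list, the minimum-stem-length and vowel conditions, and the sample word forms are hypothetical and are not the authors' rule set. Classifying an error by comparing the output with a hand-annotated gold stem is likewise a simplified stand-in for the paper's error-counting procedure.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One suffix-stripping rule with a simple context condition."""
    suffix: str
    min_stem: int  # the remaining stem must keep at least this many characters


# Hypothetical, illustrative rules; NOT the rule set used in the paper.
RULES = sorted(
    [Rule("oota", 3), Rule("wwan", 2), Rule("een", 3), Rule("tti", 2), Rule("rra", 2)],
    key=lambda r: len(r.suffix),
    reverse=True,  # try the longest matching suffix first
)
VOWELS = set("aeiou")


def context_ok(stem: str, rule: Rule) -> bool:
    """Context check (assumed): stem is long enough and still contains a vowel."""
    return len(stem) >= rule.min_stem and any(ch in VOWELS for ch in stem)


def stem(word: str, max_passes: int = 5) -> str:
    """Iteratively strip suffixes until no rule applies (or max_passes is hit)."""
    word = word.lower()
    for _ in range(max_passes):
        for rule in RULES:
            if word.endswith(rule.suffix):
                candidate = word[: -len(rule.suffix)]
                if context_ok(candidate, rule):
                    word = candidate
                    break  # one suffix removed; start the next pass
        else:
            break  # no rule fired in this pass: fixed point reached
    return word


def classify(produced: str, gold: str) -> str:
    """Label one result against a hand-annotated gold stem (simplified scheme)."""
    if produced == gold:
        return "correct"
    if produced.startswith(gold):
        return "under-stemmed"  # too little was removed
    if gold.startswith(produced):
        return "over-stemmed"  # too much was removed
    return "other"


def count_errors(gold_pairs: list[tuple[str, str]]) -> Counter:
    """Error counting: tally each outcome over (word, gold_stem) pairs."""
    return Counter(classify(stem(w), g) for w, g in gold_pairs)


if __name__ == "__main__":
    # Invented (word, gold stem) pairs for illustration only.
    sample = [("barattootatti", "baratt"), ("feen", "fee"), ("mana", "mana"), ("torra", "torra")]
    print(stem("barattootatti"))  # baratt
    print(count_errors(sample))
```

Trying the longest suffix first and re-scanning after every removal is what makes this sketch iterative: `barattootatti` loses `tti` in one pass and `oota` in the next. The context check is what keeps it from reducing a short form such as `feen` to a single letter, at the cost of an under-stemming error when the gold stem is `fee`; `torra` shows the opposite case, where a rule fires that should not, producing an over-stemming error.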

Publication date: 2010